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^ Abstract 

Signal identification in large-dimensional settings is a challenging problem in 
'TiJ biostatistics. Recently, the method of higher criticism (HC) was shown to be an 

effective means for determining appropriate decision thresholds. Here, we study 

HC from a false discovery rate (FDR) perspective. We show that the HC threshold 
^ may be viewed as an approximation to a natural class boimdary (CB) in two-class 

discriminant analysis which in turn is expressible as FDR threshold. We demonstrate 
' ' that in a rare-weak setting in the region of the phase space where signal identifi- 

^ cation is possible both thresholds are practicably indistinguishable, and thus HC 

thresholding is identical to using a simple local FDR cutoff. The relationship of the 
^ HC and CB thresholds and their properties are investigated both analytically and by 

^ simulations, and are further compared by application to foxir cancer gene expression 

^ c/^^ data sets. 
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1 Introduction 



Identification of sparse and weak signals in complex high-dimensional data is a chal- 
lenging statistical problem that has many important applications in fields as diverse as 
astronomy, finance, genetics, medicine, and proteomics. A typical biomedical task is the 
search for biomarkers using data from genome- wide association studies ( |Xie et al] 2011[ ). 
Signal identification is much more difficult than the closely related problem of signal 
detection. Whereas in detection we are concerned purely with the presence or absence of 
a signal, in identification we additionally seek to locate the signal. 

In a series of recent publications the method of "Higher Criticism" (HC) was power- 
fully advocated in settings with rare and weak features as an efficient means for signal 
detection ( [Donoho and Jin 2004| > as well as signal identification ( [Donoho and Jin [2008 



2009[ ). Originally, HC was introduced by Tukey (1976]) as an approach to multiple signifi 



cance testing using a second level test statistic computed from p-values. Importantly, in 



Donoho and Jin ( 2004 1 it was shown that HC provides a procedure that is optimal for 



signal detection in the sense that it achieves the best possible theoretical detection limit 
discovered earlier by Ingster ( 1999 ). Subsequently, HC was also employed in a threshold- 
ing procedure to determine relevant features for prediction. Again, it was demonstrated 
that the HC approach to signal identification outperforms other commonly employed 
selection strategies, in particular those based on false discovery rates ( [Donoho and Jin 
2008l|2009|. 



In 



Ahdesmaki and Strimmer (2010 1 the utility of HC for variable selection in classi- 



fication was confirmed but at the same time it was also empirically shown that in the 
signal identification problem controlling the false non-discovery rate is equivalent to the 
HC procedure. Furthermore, it was discovered by Jager and WelLner (2007 ) that HC is 
not unique in achieving the detection limit. Given the success of HC this raises questions 
about the fundamental principles that may underlie this approach. 

Here, we explore signal identification using the HC and false (non)-discovery ap- 
proaches, with the aim to provide a better understanding of HC as well as offering 
a simple explanation for HC's favorable performance. Specifically, we argue that the 
decision threshold provided by HC may also be viewed as an approximation to a nat- 
ural class boundary (CB) in classification which in turn is easy to understand from a 
false discovery rate perspective. In particular, in the rare-weak setting in the region of 
the phase space where identification is actually possible we show that the HC and CB 
threshold are nearly indistinguishable. 

The remainder of the paper is structured as follows. First, we provide a non-technical 
introduction to HC both on sample and population level. Second, we derive the ideal 
thresholds corresponding to HC and false discovery rate approaches, and explore their 
mutual relationships. Next, we investigate these thresholds in the rare-weak model and 
establish the near identity of HC and a natural CB threshold in the rare- weak identifi- 
cation setting. Finally, we demonstrate the validity of the theoretical considerations by 
simulation and by analyzing data from four gene expression experiments. 
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2 Higher Criticism 



In the following, we introduce the HC approach to signal identification, and discuss 
various properties of the HC threshold both from a sample and population point of view. 

2.1 Empirical HC threshold based on p-values 

We consider a situation with d observed test statistics 1/1, ... , i/^f . For each statistic we 
compute a corresponding p-value pi, . . . , Pd- The dimension d is potentially very large, 
as in many current applications in genomics or proteomics. 

The HC approach to signal identification then proceeds as follows: 

• First, by arranging the p-values from smallest to largest p(i), . . . , p^^j) the empirical 
distribution function of the p-values is obtained, 

F{x) = i/d for p(,) <x< p(;+i) 

with X G [0; 1], p(o) = and p(^d+i) = 1- 

• Second, the empirical HC objective function 

HC(.) = , (1) 



'F{x){l-F{x))/d 
is computed ponoho and Jin[|2008t|2009| |. 



• Third, the HC statistic HC is obtained as the maximum of the empirical HC 
objective 

HC* = maxHC(p(,)) = HC{x^^) . 

• Finally, the maximizing argument x^^ is taken as the HC decision threshold for 
signal identification. As shown in Fig. [l^ all p, < are considered "significant" 
and thus assumed to likely correspond to non-null cases. 

Informally, the empirical HC objective function HC{x) may be interpreted as z-scores 
constructed from p-values — recall that Var(f (x)) = F(x) (1 — F{x))/d. Indeed, it is 
precisely this second level assessment of p-values that was the original motivation for 



the HC approach (Tukey 1976| ) and that gave rise to its name "Higher Criticism". 



2.2 Population HC objective function and goodness-of-fit statistics 

By definition, p-values have a uniform LJ(0, 1) null distribution with Fq{x) = x. More- 
over, the marginal distribution of the p-values may be viewed as a two-component 
mixture 

F{x) = f]oFo{x) + {1 - t]o)FA{x) 
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Figure 1: a) Empirical HC decision threshold x^'^ obtained by maximizing the empirical 
HC objective, and b) Class boundary given by local FDR = 1/2 and its relationship 
to the neighboring local FDR and local FNDR thresholds. 
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Table 1: Relationship of HC statistic with other goodness-of-fit statistics. 



Supremum 



Expectation 



Not standardized Kolmogorov-Smirnov: Cramer-von Mises: 

\Fa{x) - Fo{x)\ Ef{(fA(X)-Fo(X))2} 



sup. 



Standardized 



sup 



Higher Criticism: 

|F^(x)-Fo(x)| 



Anderson-Darling: 

\ F(X)(1-F(X)) / 



of the null model Fq (^) and an alternative model (x) where //o G [0; 1] is the proportion 
of the null. With this in mind the squared empirical HC objective function can be written 
as 



HC(x) 



(F^(x)-foW) 



2 



f(x)(l-F(x)) 

The proportionality factor d{l — rjo)'^ has been left out as it does not depend on x and 
hence is irrelevant for determining the decision threshold x^'-. Thus, for maximization 
we can use the above formula rather than Eq.jlj Furthermore, it has the advantage of 
immediately generalizing to the population level (i.e. to d — )■ oo) 

^^^""^ F(x)(l-F(x)) 

which greatly facilitates the conceptual understanding of the HC approach. 

The func t ion E q. |2] is well known from the goodness-of-fit statistic of Anderson 



and Darling (1954) which is proportional to the expectation Ef (HC(X)^). Hence, the 



HC statistic bears the same relationship to the Anderson-Darling statistic as does the 
Kolmogorov-Smirnov statistic to the Cramer-van Mises statistic ( [Darling [1957 1. More- 



over, as can be seen in Tab.[T]the HC statistic is the standardized Kolmogorov-Smirnov 
(KS) statistic. In fact, the KS statistic may used in the same fashion as HC to derive a 
decision threshold x^^. 

In the mixture model for p-values it is commonly assumed (see also Section 3 on 
false discovery rates) that F^(x) > Fo(x) for all x, i.e. that the the alternative component 
is stochastically smaller than or equal to the null component. Thus, on population level 
(though not on sample level) we may leave out the absolute value signs in the first 
column of Tab. [ll 

2.3 Invariance of HC objective function 

By inspection of Eq. |2] we can derive a number of interesting properties of the HC 
objective function. 
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First, it is completely symmetric with regard to the two components in the underlying 
mixture model for the p-values. The alternative model Fa and the null model Fq pl^y 
the same role in Eq. |2] 

Second, for computing the HC objective it is not necessary to explicitly specify the 
null proportion f/o- 

Third, Eq.|2]is invariant against transformation of the underlying test statistic. This 
can be seen as follows: Under a change of variables from x toy = y{x) the distribution 
fimction changes according to 



f J for increasing x{y), and 

1 — f ^x(i/)^ for a decreasing transformation. 



Applied to Eq.|2]this leads to 



HCiyf = HC{y{x)f <x 



(fl(y)-fo^(y))^ 

f^(y)(l-f^(y)) 



Remarkably, the HC objective function Eq.|2]retains its functional form under a change 
of variables. Thus, Eq.|2]is not constrained to p-values only and may instead be applied 
to any test statistic y without the need of prior conversion to the p-value scale. The HC 
decision threshold as the location of the maximum of Eq.|2] transforms accordingly, from 



x^'^ to 1/^^ = 



3 False Discovery Rates 

For comparison we now briefly recapitulate the "False Discovery Rate" (FDR) approach 
to signal identification. Like the HC approach it is also best understood on the population 
level. For comprehensive overview see, e.g., Efron (2008|. 



3.1 Definition of FDR and related quantities 

Essentially, there are two variants of FDR criteria, one based on distributions (tail area- 
based FDR) and the other on densities (local FDR). In addition, if the roles of null and 
alternative are interchanged one arrives at the "False Non-Discovery Rate" (FNDR). 
On the p-value scale, the tail-area-based FDR (or Fdr) is defined as 

Fdr(x) = Pr("null"|X < x) = = ^ . 

By construction, Fdr(x) is the proportion of p-values from the null component found 
among all p-values smaller than x. In order for Fdr(;c) to be monotonically increasing 
with X (i.e. to ensure that the ordering of test statistics does not change) it is necessary 
that /a (^) is a monotonically decreasing density, and thus both {x) and F{x) must be 
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assumed to be concave (Langaas et alj 2005 Strimmer 2008b[ |. This also implies that 
the alternative and null are stochastically ordered with Fa{x) > fb(^) for all x. The 



empirical estimate of Fdr for a set ordered p-values . . . , p^^j) is the rule of Benjamini 
[and Hochbergl ( [T995l l, 

FiPii)) ^ 

which also shows that Fdr may be viewed as a multiplicity-adjusted p-value. As com- 
plementary error one also studies the tail-area based FNDR that is the proportion of 
non-null p- values among p-values larger than x. On the p-value scale it is defined as 



Fndr(x) = Pr("alternative"|X > x) = {1 - rjo) 



Fx) 



Fndr and Fdr play a similar role as sensitivity and specificity in classical testing (Gen 
ovese and Wassermann[|2002| . 

Local FDR (fdr) is a density-based quantity defined as the probability of the null 
under the observed data. 



fdr(x) =Pr("null"|X = ;c) 



Vofo{x) ^ J]o_ 
fix) fix) 



(3) 



with fix) = //o/o(x) + (1 — //o)/a(^)- As with Fdr, to ensure that the local FDR is 
increasing with x the density /a(^) is assumed to be monotonically decreasing. The 
local FNDR is the probability of the alternative under the observed data, and is thus is 
given by 

fndr(x) = 1 - fdr(x) . 

There is also a direct relationship between fdr and Fdr. As can be seen from its 
definition Fdr is a conditional average of fdr. Hence, for monotonic /a(^) we find 
Fdr(x) < fdr(x) for all x ( |Efron[|2008 [). 

Like the HC objective function, fdr and Fdr are scalars and thus transform under a 
change of coordinates from x to 1/ as fdr(i/) = fdr(x(i/)) and Fdr(i/) = Fdr(x(i/)). 



3.2 Signal identification with FDR and FNDR 

A standard approach to o btain a decision threshold with FDR is to refer to the rule of 
Benjamini and Hochberg ( 1995 1 with a cutoff such as Fdr (p(,) ) < 0.05. Alternatively, a 

threshold may be found by controlling local FDR, for instance by requiring fdr (p(,) ) < 
0.2 (e.g. Efron[ |2008| |. This ensures that the identified features are mostly from the 
alternative with only little contamination by unwanted null features. Conversely, if 
the interest is to identify true null features then similar thresholds may be imposed on 
FNDR rather than FDR ( [Ahdesmaki and Strimmer||2010| |. 

This illustrated for local FDR and local FNDR in Fig. [l]b where the signal space is 
divided by the decision thresholds x^'^'^ and x^^'^^ into three distinct zones corresponding 
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to areas where one is very sure about membership to the null (local FNDR < 0.2 or 
local FDR > 0.8) or to the alternative (local FDR < 0.2) and one additional intermediate 
region. 

From a classification perspective there exists another threshold — the class boundary 
x^^ — that provides a natural separation between null and non-null components. At 
^CB |.]-jg probabilities of membership to the alternative and to the null are equal to 1 /2. 
Hence, in terms of local FDR we have 

fdr(xC«) = fndr(xCi5) = 1 . 

As can be seen in Fig. |l]t» by construction x'-^ is located inbetween x^"'^'^ and x^'^^. From 
the definition fdr(x*--^) = 1/2 and Eq.jsjwe obtain the condition 

^o/o(^^'^) = (1-'7o)/a(xC') (4) 

for the CB threshold. 



4 Comparison of CB and HC decision thresholds 

It is now instructive to study the mutual connections among the various decision thresh- 
olds, in particular among x^^, x^^, and x^^. 



4.1 Kolmogorov-Smirnov (KS) decision threshold 

The location x^^ where the Kolmogorov-Smirnov objective function |Fa(^) ~ Fo{x) \ is 
maximized is given by 

Mx''^) = fAix""') . (5) 

Thus, the KS decision threshold coincides with the class boundary x^^ if = 1/2. Thus, 
the KS threshold implicitly assumes that null and non-null components have the same 
prior probability. 



4.2 HC decision threshold 

Using Eq. |2] we may determine the population decision threshold that one tries to 
estimate by maximizing the empirical HC objective HC{x). This leads to the general 
condition 

/o {2F(1 - F) + {Fa - Fo) (1 - 2f ) i^o} = 
fA {2F(1 - F) - {Fa - Fo)(l - 2F) (1 - ?/o)} 

that must be satisfied by the HC decision threshold x^^ (note that in Eq.[6]the arguments 
to F, Fo and Fa have been left out for the sake of clarity). 
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There are two cases when the HC threshold condition simplifies substantially. First, 
if the null and alternative components are well separated: then f^(x^'-) = 1 and 
Fo{x^^) = and consequently F{x^^) = 1 — rjo so that Eq.[6]reduces to 

,/o/o(x«^) = (1-?/o)/a(^"^). 

Thus, for well-separated null and alternative the HC threshold is identical to the CB 
threshold. 

Second, if null and alternative components are very close: then Fa{x^'^) ~ Fo{x^^) 
and Eq. [6]becomes 

i.e. the HC threshold becomes identical to the KS threshold. 

Hence, the HC threshold may be viewed as a compromise between the CB threshold 
and the KS threshold. This is directly observed in the study of the "rare-weak" model 
(cf. Tab.|2k). 



5 Rare weak model 

The use of "Higher Criticism" is particularly advocated in settings where the signal is 
sparse and weak. This situation is described by the so-called "rare weak" (RW) model 
that has been used to study the performance of HC. In the following we introduce the 
RW model and compare corresponding decision thresholds. 

5.1 Setup of RW model 

The RW model is a sparse normal mean mixture model with 

Z~ (l-e)N(0,l)+eN(T,l). (7) 

Its two parameters t S [0; oo] and e S [0; 1] describe intensity and sparsity of the signal. 
If e is small then the non-null features are rare, and likewise if r is small then the effect 
size is weak (hence the name of the model). From this mixture we observe z-scores 
Zi, . . . ,z^f, which provide the data from which decision thresholds are inferred. 

Despite its simplicity, this model is sufficiently rich to study the behavior of signal 



detection and signal identification methods ( Ingster[ 1999 Donoho and Jin 2004[ 2008 



2009} IXieetaTI [20TT) [Ji and Jinj [20T2] |. A generalized RW model with an additional 



variance parameter in the alternative is discussed injCaTetaT ( 2011[ |. 



A typical scenario where the RW model naturally arises is in classification. For 
example consider a two class setting with means fi^ = ^ and ^2 = ~F where fi = 
(. . . , }io, . . . , 0, . . .)^ is a d-dimensional vector containing either or fiQ as components 
and with e describing the proportion of non-zero entries. Further assume an identity 
covariance 1^ and equal number of observations tii = n2 = n/2 from the two classes. 
Then the corr esponding z-score vector (1/ni + l/n2)^^^^{fi^ — fi2) used for variable 
selection (e.g., Zuber and Strimmer 2009| simplifies to z = ^/nfi. The d components of z 
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follow the RW model of Eq. [Tjwith t = 1/ n/^o- Note the confounding of n and }Iq, so a 
small number of observations n and large }Iq gives rise to the same RW model as large 
sample size and small 

Instead of e and t it is sometimes convenient to use the alternative parameterization 

/5, = -log(e)/log(d) 

and 

t2 



2 //log(d) 

with corresponding backtransformations = d^^ and t,- = y^2r\og{d). 

The motivation to use j6 instead of e to measure sparsity is that for d observations 
the smallest possible fraction of the alternative is 1/d. The change of variables maps 
e G [^; 1] to /3 S [0; 1]. A sparse setting in the RW model is characterized by jf3 G [|, 1] 
or equivalently e < d^^^'^. Similarly, the alternative intensity parameter is a map of 
T G [0; log((i)] to r G [0; 1]. As for d observed z-scores their maximum is bounded in 
expectation by y^21og(£f), a RW model with r > 1 contains comparatively well-separated 
null and alternative components whereas in a model with r < 1 the signal is weak. 

5.2 Decision boundaries for the RW model 

The RW model is simple enough to allow analytical calculations of some decision 
boundaries. 

Using the null and alternative densities /o(z) = ^=e^^ ^■^andfA{z) = ^=6^^^^'^) ''^ 



and distribution functions Fo(z) = 0(z) and F/^{z) = 0(z — r) the KS decision threshold 
(Eq. |5]l for the RW model is 

_KS _ I 
~ 2" 

Similarly, the classification class boundary (Eq. [Ijl simplifies for the RW model to 

CR T" 1 , / 1 — e 



2 T ° V ^ 

For e = 1/2 the CB threshold reduces to the KS threshold and for e < 1/2 we have 
^CB y ^KS_ Pqj. fixed e and the effect size t large enough the second term above also 
vanishes and hence also leads to the KS threshold. As the proportion of non-null features 
becomes smaller (e — ?■ 0) the decision threshold moves to infinity (z^^ — > 00). Thus, if 
e = no feature will be classified as non-null. 

For the HC decision threshold unfortunately no analytic expression for is avail- 
able. From the general considerations above (cf. Section we know that for larger 
T the HC threshold approximates the CB threshold, and that both reduce to the KS 



threshold for e = 1/2. Furthermore, Donoho and Jin (2009, Appendix Eq. 1.1) show that 
for the RW model fdr(z^^) > 1/2. This together with the monotonicity of the local FDR 
in the RW model implies that 



^HC < ^CB 
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Thus, in general using the HC decision threshold causes the inclusion of more features 
than using the CB threshold. 

Of particular interest is the behavior of the HC threshold for small values of e. 
Specifically, if e = and t is finite then the HC threshold is also finite. For example, 
e = and t = 2 leads to z^^ 3.35, which is distinctly different from the natural class 
boundary z^^ — )■ oo. Thus, by construction the HC criterion (and also the KS threshold) 
encourages false positives in signal identification. 

A comparison of the KS, HC, and CB thresholds for some settings of e and T is given 
in Tab. |2^. As expected, with increasing t the HC and CB thresholds become very similar 
and for e = 1/2 both HC and the CB threshold reduce to the KS threshold. Thus, the 
pattern confirms the general relationships of these decision thresholds discussed above. 

In addition, in the RW model there exist a further close link between the HC and 
CB thresholds. This results from the special structure of the parameter space of the RW 
model discussed next. 



5.3 Phase space of the RW model 

Within the RW model the behavior of signal detection and identification procedures 
have been studied extensively. This has lead to the remarkable insight that there exist 
several fundamental boundaries in its phase space that give rise to four distinct regions, 
as illustrated in Fig. |2^. 

[Ingster ( 1999[ l discovered the detection boundary 



''detect 



-yT^)2 /3G[|;1]. 



Below this boundary lies the "undetectable" region in which even signal detection is 
impossible, i.e. no method is able to decide whether e 7^ 0. Conversely, above the 
detection boimdary it is possible to consistently estimate e ( |Cai et al] 2007[ >. 
Donoho and Jin ( |2004| ) report the identification boundary 



ndentiP) = |6- 

It is only above this boundary in the "estimable" and "recoverable" regions that signal 
identification by thresholding is actually possible. In terms of original parameters this 
corresponds to the conditions t > 1^— 21og(e) or e > exp(— t^/2). Directly below 
this boundary lies the "detectable" region where detection of a signal is possible but 
not identification. This shows that signal identification is more difficult than signal 
detection. 

Finally, Xie et al. | 2011[ ) and Ji and Jin ( 2012| demonstrated the existence of the recovery 



boundary 

TrecoviP) = (1 + a/W)' 

above which in the "recoverable" region almost all signal can be completely identified. 
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Table 2: a) Comparison of the KS, HC and CB decision thresholds in the RW model, 
and b) Analysis of four cancer gene expression data sets with shrinkage discriminant 
analysis. 



a) Comparison of Thresholds 



Setting 






zCB 


T = 2 








£ = 


1 


3.3514 


00 


e = 0.001 


1 


3.0707 


4.4534 


£ = 0.01 


1 


2.5203 


3.2976 


£ = 0.1 


1 


1.7574 


2.0986 


£ = 0.5* 


1 


1.0000 


1 


T = 4 








£ = 


2 


3.3514 


00 


£ = 0.001* 


2 


3.6377 


3.7267 


£ = 0.01* 


2 


3.0965 


3.1488 


£ = 0.1* 


2 


2.5268 


2.5493 


£ = 0.5* 


2 


2.0000 


2 



T = b 
e = 
e = 0.001* 

£ = 0.01* 

e = 0.1* 
£ = 0.5* 



3 
3 
3 
3 
3 



8.1607 
4.1454 
3.7631 
3.3652 
3.0000 



00 

4.1511 
3.7659 
3.3662 
3 



* Signal identification is poss ible as e > 
exp(— T^/2), see Section 5.3 



b) Cancer Gene Expression Data 



Data / 
Method 



Prediction Error 



Selected 
Variables 



Prostate (d = 6033, n = 102, K = 2) 




CB 0.0637 


(0.0053) 


115 


HC 0.0497 


(0.0045) 


116 


FNDR 0.0550 


(0.0048) 


131 


Lymphoma {d = 


4026, n = 62,K = 


= 3) 


CB 0.0211 


(0.0042) 


178 


HC 0.0000 


(0.0000) 


345 


FNDR 0.0036 


(0.0018) 


392 


SRBCT {d = 2308, n = 63, X = 4) 




CB 0.0000 


(0.0000) 


88 


HC 0.0007 


(0.0007) 


174 


FNDR 0.0000 


(0.0000) 


89 


Brain {d = 5597, n =42,K = 5) 




CB 0.1633 


(0.0120) 


78 


HC 0.1417 


(0.0108) 


131 


FNDR 0.1525 


(0.0120) 


102 



K: number of classes in the response 
variable. 
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a) Phase Space of Rare-Weak Model 







recoverable 


estimable 








detectable 








undetectable 



0.7 O.i 



0.5 



0.6 



0.9 



1.0 



b) Ratio of HC and CB Thresholds at the Signal Identification Boundary and Above 




1e-05 



1e-04 



1e-03 



1e-02 



1e-01 



Figure 2: a) Phase space of the RW model following Xie et al. ( |2011| and Ji and Jin ( 2012[ ). 
The bold line shows the signal identification boundary ?'ident(/^) = above which signal 



identification is possible. For details on the four regions see the description in Section 5.3 
b) Ratio of and x*--^ thresholds at the signal identification boundary (solid line) and 
above (dotted lines). Note that Tident(^) = -\/— 21og(e). 
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5.4 HC threshold as approximation of the natural class boundary 

When comparing the KS, HC and CB decision thresholds in Tab. a striking phe- 
nomenon can be observed: whenever signal identification is possible, i.e. if e > 
exp(— T^/2), then then z^^ and z^^ are very similar. 

To investigate this further we computed the ratio of the HC and CB threshold directly 
at the signal identification boundary, and above (Fig.|2]b). Already at the boundary this 
ratio is close to 1, especially for small values of e. Moving further into the "estimable" 
and "recoverable" regions the differences between the two thresholds become negligible. 

Hence, in the RW model in the area where signal identification is possible 
and z^^ are in the worst case very similar and mostly indistinguishable for practical 
purposes. 

6 Data examples 

To further study the relationship among the HC, CB, and FNDR decision thresholds we 
analyzed both simulated as well as experimental data. 

6.1 Synthetic data 

We simulated data from the RW model at the signal identification boundary and above, 
as follows: 

1. We sampled d = 10,000 z-scores from the mixture model Eq.[7]with e = 0.01 and 
T G {3,4,5,6}. For t = 3 this is is a sparse and weak scenario located directly at 
the signal identification boundary (e ^ exp(— t^/2)). 

2. From the test statistics zi, . . . ,Z(i we computed p-values according to p, = 1 — 

3. Subsequently, the empirical HC threshold was obtained by maximization of Eq. [l] 

4. In addition, local FDR was estimated using the f drtool algorithm fS tri mmer] 
2008a.b) and correspondingly the CB (local FDR = 0.5) and FNDR (local FDR = 0.8) 
decision thresholds were identified. 

5. For each of the three investigated thresholds (HC, CB, FNDR) the number of false 
positives (FP), false negatives (FN), true positives (TP) and true negatives (TN) 
were determined. 

6. The simulations were repeated B = 1000 times to estimate mean errors and their 
standard deviations. 

The results are visualized in Fig. |3] As expected, the HC and CB thresholds yield 
similar results with growing r. However, if the signal is weak (small t) signal identifica- 
tion with HC leads to many more more false positives, and in addition the variability 
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FP FN Total Error = FP+FN 




Figure 3: Comparison of errors when using the HC, CB, and FNDR decision thresholds 
on data simulated from the RW model located directly at the detection boundary (e = 
0.01 and t = 3) and above (t > 3). 



of the error rates for HC is very large. Conversely, in this situation the CB threshold is 
more cautious and thus results in more false negatives. For all settings the error rates of 
HC are found in between those of CB and FNDR. Interestingly, the total error (FP+FN) 
is smallest when using the CB threshold. 

We also repeated this study with other sparsity settings e > 0.01. The resulting error 
plots all show exactly the same pattern of convergence of the CB and and HC methods 
as Fig.|3] 



6.2 Gene expression data 

Next, we also analyzed four clinical gene expression data sets related to prostate cancer 



(Singh et al. 20021 lymphoma ( ,Alizadeh et al.[ |2000), small ro und blue cell tumors 
(SRBCT) ( Khan et al.[ 2001 1, and brain cancer (Pomeroy et al. 20021. Previously, in 
Ahdesmaki and Strimmer ( 2010 1 we have compared the relative effectiveness of the 
FNDR and HC thresholds to select relevant genes in shrinkage discriminant analysis 
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using CAT scores ( |Zuber and Strimmer 2009[ ). 

In Tab. |2]b we show in addition the estimated prediction error and the number of 
selected variables for the CB threshold. Generally, using the CB decision threshold leads 
to the smallest predictor sets. Except for the prostate data the number of selected genes 
is roughly half compared to using the HC threshold as criterion. As the predictor error 
is only slightly increased we conclude that most of the additionally included predictors 
by HC are false positives. 

For practical analysis of gene expression data this implies that using x^^ yields — 
in comparison with — smaller and hence more interpretable predictor gene sets 
without compromising prediction error. 



7 Discussion 

Our investigation of the relationship of the HC and FDR methods started with the 
aim to better understand HC as a method for signal identification. In the context of 
variable selection for classification |Donoho and Jin ( 2008 ) demonstrated empirically 



that using as a decision threshold outperforms competing procedures, in particular 
those using a threshold based on FDR. [Donoho and Jin (2009 1 further justified HC as a 



signal identification procedure by showing that x^^ minimizes an approximation to the 
missclassification error. 

Here, we argue that the HC decision threshold may also be viewed as an approx- 
imation of the natural class boundary between the null and alternative groups in the 
RW mixture model. This CB threshold can be directly expressed in terms of local FDR 
and local FNDR. Importantly, in the RW model in the region of the phase space where 
signal identification is possible both thresholds are either very similar or practically 
indistinguishable. Interestingly, computing this threshold via HC uses only distribution 
functions (f , Fq, and F^, cf. Eq.|2]l but in addition requires optimization, whereas compu- 
tation via local FDR is direct but employs densities (/, /o, and /a, cf. Eq. |3]l which are 
more difficult to obtain. 

If the two thresholds are notably different then using the HC threshold leads to the 
inclusion of more false positives, and conversely the CB threshold yields a more compact 
feature set but with slightly increased prediction error. In short, the CB threshold is more 
cautious than the HC threshold (and the FNDR threshold). 

Hence, our study provides further support to the excellent performance of HC for 



signal identification. However, our conclusions are different from that of Donoho and 



Jin (2008 2009). First, we show that false discovery rates, properly applied, are indeed 



perfectly useful for signal identification, which has been disputed earlier. Second, the 
convergence of the CB and HC thresholds in the "estimable" and "recoverable" regions 
indicates that this is what HC is actually approximating. 

In general, estimation of the CB threshold is a challenging problem as this requires 
the fit of a mixture model and estimation of the mixing density. In contrast, the empirical 
HC threshold can readily be determined using p-values computed from Fq alone. Thus, 
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for signal identification the HC approach provides a simple yet effective means to 
approximate the CB threshold. 
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